{smcl}
{* *! version 2.01  17oct2016}{...}
{hline}
help for {hi:xtdpdml} version 2.01
{hline}


{title:Dynamic Panel Data Models using Maximum Likelihood}


{marker syntax}{...}
{title:Syntax}

{p 8 16 2}
{opt xtdpdml} y [time-varying strictly exogenous vars]
   [{cmd:,} {it:inv(time-invariant exogenous vars)} {it:pre(predetermined vars)} {it:other_options}]

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Independent variables (other than strictly exogenous)}
{synopt :{opt inv(varlist)}}Time-invariant exogenous variables, e.g. year of birth{p_end}
{synopt :{opt pre:det(varlist)}}Time-varying predetermined (sequentially exogenous) variables {p_end}
{synopt :{opt ylag:s(numlist)}}Specifies lagged values of y to be included in the model. Default is lag 1. {p_end}

{syntab:Dataset options}
{synopt :{opt wide}}Data are already in wide format (default is long format with xtset preceding the command){p_end}
{synopt :{opt stayw:ide}}Keep data in wide format after execution. May help with
some sem post-estimation commands, e.g. predict.{p_end}
{synopt :{opt tfix}}Recode time variable to equal 1, 2,..., T (number of waves). Set delta = 1.{p_end}
{synopt :{opt std}}Standardize all variables in the model to have mean 0 and variance 1 (in long format) {p_end}
{synopt :{opt std(varlist)}}Standardize specified variables to have mean 0 and variance 1{p_end}

{syntab:Model Specification and Constraints Options}
{synopt :{opt evars}}When there are no predetermined variables in the model this sometimes helps with convergence{p_end}
{synopt :{opt alphafree}}Allow Alpha (fixed) effects to vary across time{p_end}
{synopt :{opt xfree}}All x effects free to vary across time{p_end}
{synopt :{opt xfree(varlist)}}x effects of specified variables free to vary across time{p_end}
{synopt :{opt yfree}}All lagged y effects free to vary across time{p_end}
{synopt :{opt yfree(numlist)}}effects of specified lagged ys free to vary across time{p_end}
{synopt :{opt constinv}}constrains constants to be equal across waves. Alias for {it:nocsd}{p_end}
{synopt :{opt nocsd}}Cross-sectional dependence is NOT allowed. Alias for {it:constinv}{p_end}
{synopt :{opt errorinv}}constrains error variances to be equal across waves. May cause convergence problems{p_end}
{synopt :{opt re}}Random Effects Model (Alpha uncorrelated with Xs){p_end}

{syntab:Reporting}
{synopt :{opt ti:tle(string)}}Gives a title to the analysis, e.g. {it: ti(Baseline Model)}{p_end}
{synopt :{opt detail:s}}shows all the sem output + highlights. Otherwise you
  only get highlights.{p_end}
{synopt :{opt show:cmd}}show the sem command generated by xtdpdml{p_end}
{synopt :{opt gof}}report several goodness of fit measures{p_end}
{synopt :{opt tsoff}}do not use time-series notation in the highlights output{p_end}
{synopt :{it:{help estimation options##display_options:display_options}}}Assorted display options, e.g. noci,
vsquish, cformat{p_end}
INCLUDE help shortdes-coeflegend

{syntab:Other options}
{synopt :{opt mp:lus(fname, opts)}}Create Mplus inp and dat files. File may need some editing before running.{p_end}
{synopt :{opt semf:ile(fname, r)}}Create do file with the generated sem commands{p_end}
{synopt :{opt dry:run}}Do not actually estimate the model.{p_end}
{synopt :{opt iter:ate(#)}}Maximum number of iterations allowed. Default is 250.{p_end}
{synopt :{opt tech:nique(options)}}Estimation technique used. Default is {it: nr 25 bhhh 25}.{p_end}
{synopt :{opt semopts(options)}}Additional sem options to be included in the generated sem command.{p_end}
{synopt :{opt fiml}}Full Information Maximum Likelihood is used for missing data.{p_end}
{synopt :{opt v12}}Lets xtdpdml run under Stata 12.1. Probably ok but use at own risk.{p_end}
{synopt :{opt skipcfa:transform}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{opt skipcond:itional}}Changes the way start values are computed in Stata 14.2 and later.{p_end}
{synopt :{it:{help maximize:maximize_options}}}control the maximization process; seldom used{p_end}

{synoptline}

{p 4 6 2} Factor variable notation is NOT supported.{p_end}
{p 4 6 2}
{it:Strictly exogenous} and {it:predetermined} variables may contain time-series operators; see {help tsvarlist}.{p_end}
{p 4 6 2}
Many/most sem postestimation commands will work after xtdpdml.
See {manhelp sem_postestimation R:sem postestimation} for features
available after estimation.  You may need to use {it:staywide} to get some options to work. {p_end}


{marker description}{...}
{title:Description}

{pstd} {cmd:xtdpdml} fits Dynamic Panel Data Models using Maximum
Likelihood. It basically works as a shell for {it:sem}, generating the
necessary {it:sem} commands. It can also generate code for running these models
in Mplus. It tends to work best when panels are
strongly balanced,  T is relatively small (e.g. less than 10), and there
is no missing data. See the section on Special Topics below for suggestions
on what to do if your data do not meet these criteria.

{pstd} Panel data make it possible both to control for unobserved
confounders and to include lagged, endogenous regressors. Trying to do
both at the same time, however, leads to serious estimation
difficulties. In the econometric literature, these problems have been
solved by using lagged instrumental variables together with the
generalized method of moments (GMM). In Stata, commands such as xtabond
and xtdpdsys have been used for these models. 

{pstd} xtdpdml addresses the same problems via maximum likelihood
estimation implemented with Stata's structural equation modeling (sem)
command. The ML (sem) method is substantially more efficient than the
GMM method when the normality assumption is met and suffers less from
finite sample biases. xtdpdml simplifies the SEM model specification
process; makes it possible to test and relax many of the constraints
that are typically embodied in dynamic panel models; unlike most related
methods, allows for the inclusion of time-invariant variables in the
model; and takes advantage of Stata's ability to use full information
maximum likelihood (FIML) for dealing with missing data. xtdpdml also provides
an overall goodness of fit measure by default and provides access to others 
via the sem postestimation command {cmd:estat gof, stats(all)}. Many other
sem postestimation commands can be used as well. Since xtdpdml is a shell
for sem, you should use the {cmd:sem} command if you want to replay results.

{pstd} {it:Data should be xtset with both the panel id and time variable specified.} 
The time variable should be coded t = 1, 2, 3, ...,
T, and delta (the period between observations) should equal 1.  Other
values for t (e.g. years, or starting at 0, or skipped values of t) will
likely produce error messages or incorrect results. If necessary, recode
the time variable before running xtdpdml. Or, you can use the {it:tfix}
option and let xtdpdml recode the time variable for you (but you can
still get errors if, say, delta was not specified correctly in the
source data set, e.g. data were collected every two years and delta was
set to 1). The model assumes that time intervals are equally spaced.
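
{pstd} As a minimal sketch of recoding the time variable by hand, suppose
the panel identifier is called {it:id} and time is recorded in a variable
called {it:year} (both names are hypothetical). Assuming the waves are
equally spaced and no wave is missing from the data set as a whole,
the group() function of {cmd:egen} maps the sorted values of year onto
1, 2, ..., T:{p_end}
{phang2}{cmd}* id and year are hypothetical variable names{p_end}
{phang2}egen t = group(year){p_end}
{phang2}* xtset uses delta = 1 unless told otherwise{p_end}
{phang2}xtset id t{p_end}{txt}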

{pstd} {it:All variable names should start with lowercase letters.} 
As the Stata sem manual points out, "In the command language, 
variables are assumed to be observed if they are typed in lowercase and 
are assumed to be latent if the first letter is capitalized. 
Variable educ is observed, while variable Knowledge or KNOWLEDGE is 
latent. If the observed variables in your dataset have uppercase names, 
type {cmd:rename _all, lower} to convert them to lowercase."

{pstd} By default, most effects (with the exceptions of the constants and error variances) are
constrained to be equal across waves, making it possible to present only a single set
of parameter estimates for each variable in the model. These constraints can be relaxed
via options such as {it:xfree}, {it:yfree} and {it:alphafree}.

{pstd} The models include a latent variable ALPHA that reflects the fixed effects that are
common to all time periods. By default, the coefficient of
ALPHA is constrained to have a value of 1.0 at each time period. The alphafree option can
be used to allow the effects of ALPHA to vary across waves. Also by default, ALPHA 
freely covaries with the time-varying exogenous variables. If {it:re} is specified, 
a random effects model is estimated where ALPHA is uncorrelated with all of the X
variables. 

{pstd} There are four types of independent variables that can be
specified. There is considerable flexibility in specifying which lagged
values of variables (if any) should be included in the model, e.g. no
lags or heterogeneous lags can be specified.

{p 6 6 2} The lag 1 value of y (e.g. L1.y) is included by default. This can be changed
with the {it:ylag} option.

{p 6 6 2} Strictly exogenous variables are those that (by assumption) are uncorrelated with
the error terms at all points in time.  Equivalently, we assume that
they are not affected by prior values of the dependent variable. These variables
are specified on the left side of the comma, before the options. Time series
notation can be used, e.g. {it: xtdpdml y L1.wages L2.wages} would include the first
and second lagged values of wages as independent variables.

{p 6 6 2} Predetermined variables, also known as sequentially exogenous, are
variables that can be affected by prior values of the dependent
variable. Time series notation can be used. These are specified with the {it:pre} option.

{p 6 6 2} Time-invariant exogenous variables are variables whose values are constant
across time, such as year of birth. You of course DO NOT use time series
notation with these. The ability to use time-invariant exogenous variables in the
model is one of the key advantages of the sem approach. These are
specified with the {it:inv} option. These variables are assumed to be
uncorrelated with ALPHA.
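
{pstd} As a sketch of how the four types map onto the syntax, consider
the following command, which uses the variable names from the wages data
in the Examples section below. Here wks is y; ylags(1) requests lag 1 of y
(this is the default, so it is written out only for illustration); L.lwage
is strictly exogenous; L.union is predetermined; and ed is time-invariant.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ylags(1){p_end}{txt}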


{marker options}{...}
{title:Options}

{dlgtab:Independent variables (other than strictly exogenous)}

{phang}
{opt inv(varlist)} Time-invariant exogenous variables, e.g. year of birth. {p_end}

{phang} {opt predet(varlist)} Predetermined variables, also known as
sequentially exogenous. Predetermined variables can be affected by prior
values of the dependent variable. Time series notation can be
used.{p_end}

{phang}
{opt ylag(numlist)} By default the lag 1 value of y is included as an independent variable.
Different or multiple lags can be specified, e.g. ylag(1 2) would include lags 1 and 2 of y.
ylag(0) will cause no lagged value of y to be included in the model.{p_end}

{dlgtab:Dataset Options}

{phang}
{opt wide} By default, data are assumed to be xtset long with both time and panelid
variables specified. The data set is temporarily converted to wide format for use with sem.
If data are already in wide format use the {it:wide} option. However, note that the file
must have been created by a reshape wide command or else it won't have information
that xtdpdml needs. Use of this option is generally discouraged.
{p_end}

{phang} {opt staywide} This will keep the data in wide format after
running xtdpdml. This may be necessary if you want to use post-estimation
commands like predict.
{p_end}
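
{pmore} A hedged sketch of using {opt staywide} with predict, assuming the
wages data have been loaded and xtset as in the Examples section below and
that sem's predict accepts a stub with the xb option in your version of
Stata. The stub name yhat is arbitrary. Because the data remain in wide
format afterwards, reload the long data before fitting further models.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) staywide{p_end}
{phang2}* linear predictions for the observed endogenous variables{p_end}
{phang2}predict yhat*, xb{p_end}{txt}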

{phang} {opt tfix} Time should be coded t = 1, 2, ..., T where T =
number of waves. By default, units like years (e.g. 1990, 1991) will
cause errors or incorrect results. There will also be errors or
incorrect results if delta does not equal 1, e.g. t = 1, 3, 5. The tfix
option will recode time to equal 1, 2, ..., T and set delta = 1. You can
still have problems though if delta was not specified correctly in the
source data set or if interval width is not consistent. It is safest if
you correctly code time yourself but tfix should work in most cases.
{p_end}

{phang} {opt std} std standardizes all the variables in the model
to have mean 0 and variance 1. It does this while the data set is still
in long format. You probably will not want to use this option in most cases
but it can sometimes help when the model is having trouble converging.
Does not work if the {opt wide} option has been specified, i.e. data
are already in wide format.{p_end}

{phang} {opt std(varlist)} standardizes only the selected variables to
have mean 0 and variance 1. Does not work if the {opt wide} 
option has been specified. Do NOT use time series notation; just
list the names of the variables you want standardized.{p_end}

{dlgtab:Model Specification and Constraints Options}

{phang} {opt evars} sometimes helps with convergence when there are no
predetermined variables in the model. It is an alternative and usually
less efficient way of specifying the error terms. But sometimes it helps
and may be necessary for replicating results from earlier versions of
the program. {p_end}

{phang} 
{opt alphafree} alphafree lets the Alpha (fixed) effects differ across
time. Note that, if this option is used, Alpha will be normalized by
fixing its variance at 1; otherwise the model sometimes has convergence problems.
{p_end}

{phang} {opt xfree} xfree lets the effects of all the independent
variables (except lagged y) freely differ across time. {p_end}

{phang} {opt xfree(varlist)} lets the effects of the specified
independent variables freely differ across time. {p_end}

{phang}
{opt yfree} lets all lagged y effects freely differ across time.
{p_end}

{phang}
{opt yfree(numlist)} allows the specified lagged y effects to freely differ across time.
{p_end}

{phang} {opt nocsd} (alias is {opt constinv}) Cross-sectional dependence
is NOT allowed, i.e. constants are constrained to be equal across waves. 
This is equivalent to no effect of time. This option sometimes 
causes convergence problems.{p_end}

{phang}
{opt errorinv} constrains error variances to be equal across waves. May cause convergence problems.{p_end}

{phang}
{opt re} Random Effects Model (Alphas uncorrelated with Xs){p_end}
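
{pmore} A sketch, not a recommended procedure, of comparing the default
fixed effects specification with the {opt re} specification using the wages
data from the Examples section below. Union is treated as strictly
exogenous here purely to keep the illustration simple, and the stored
names fe_mod and re_mod are arbitrary. The likelihood-ratio test assumes
that the random effects model is nested in the fixed effects model, which
should hold because {opt re} only adds the constraint that Alpha is
uncorrelated with the Xs.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage L.union, inv(ed) ti(Fixed effects){p_end}
{phang2}est store fe_mod{p_end}
{phang2}xtdpdml wks L.lwage L.union, inv(ed) re ti(Random effects){p_end}
{phang2}est store re_mod{p_end}
{phang2}lrtest fe_mod re_mod, stats{p_end}{txt}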

{dlgtab:Reporting Options}

{phang} 
{opt title(string)} Gives a title to the analysis. This title will appear in both the
highlights results and (if requested) the Mplus code. For example, {it:ti(Baseline Model)}
{p_end}

{phang} 
{opt details} This will show all the output generated by the sem command. Otherwise only a
highlights version is presented. This can be useful if you want to make sure the model
specification is correct or if you want information not contained in the highlights.
{p_end}

{phang} 
{opt showcmd} This will show the sem command generated by xtdpdml. This can be useful to
make sure the estimated model is what you wanted.
{p_end}

{phang} 
{opt gof} Reports several goodness of fit measures after model estimation. It has the
same effect as running the sem postestimation command {cmd:estat gof, stats(all)} 
after xtdpdml.
{p_end}

{phang} 
{opt tsoff} By default, when possible the highlights output produced by
xtdpdml will use time-series notation similar to what you see with
commands like xtabond, e.g. L3.xvar will represent the lag 3 value of
xvar. Since the data are reshaped wide, this is not the same as the name
of the variable that was actually used, e.g. it might be that L3.xvar
corresponds to xvar2. tsoff will turn off the use of time series
notation in the highlights printout and show the names of the variables
actually used in the reshaped wide data.
{p_end}

INCLUDE help displayopts_list

{phang} 
{opt coeflegend} Display the legend instead of the statistics. This can be useful if, say,
you are trying to use post-estimation test commands to test hypotheses about effects.
{p_end}

{dlgtab:Other Options}

{phang} 
{opt mplus(filenamestub, mplus options)} This will create inp and data
files that can be used by Mplus (has only been tested with Mplus 7.4).
This is adapted (with permission) from UCLA's and Michael Mitchell's
stata2mplus command but does not require that it be installed. The
filenamestub must be specified; it will be used to name the Mplus .inp
and .dat files. Everything else is optional. Options {opt r:eplace},
{opt mi:ssing(#)}, {opt listw:ise}, {opt a:nalysis}, and {opt out:put}
are supported. {opt replace} will cause existing .inp and .dat files
to be overwritten. {opt missing} specifies the missing value for all
variables; default is -9999. {opt listwise} will cause listwise
deletion to be used rather than fiml. {opt analysis} and {opt output}
specify options to be passed to the Mplus analysis and output options.
As is the case in Mplus, multiple analysis and output options should
be separated by semicolons. xtdpdml cannot check your Mplus syntax so
be careful.

{phang}
So, for example, if the user specified 
{cmd:mplus(myfile, r missing(-999999) analysis(iterations = 2000) out(mod(3.84); sampstat))} 
myfile.inp and myfile.dat would be created
(replacing any existing files by those names). All missing values
would be set to -999999. The Mplus analysis option would set iterations equal to 2000
(default is 1,000). The output option (note how ; was used to separate the two options requested)
would request that modification indices > 3.84
be printed out and that sample statistics be included in the output. Obviously you
need to understand Mplus to use the analysis and output options; if you don't use them
the default values will probably meet most of your needs. You can, of course, edit the .inp
file on your own before running Mplus.

{phang}
Include the {opt dryrun} option if you only want the mplus code.
Keep in mind that Mplus only shows the first 8 characters of variable
names; also since data are reshaped wide the names of time-varying variables should be 7
characters or less (or 6 characters or less if T > 10) if you want to see
the full variable name in the output. Some editing of the .inp file
may be required first, e.g. variable names may need to be shortened and/or long lines may have to
be split. Mplus automatically uses fiml regardless of whether you have
asked for it or not; this can be overridden with the mplus {opt listwise} option. 
Most/all xtdpdml model specification options
are supported but the user should still check the coding. 
{p_end}

{phang} 
{opt semfile(filename, r)} The generated sem commands will be output to a file
called filename.do. The r option can be specified to replace an existing do file
by that name. This is useful if you want to try to modify the sem commands
in ways that are not easily done with xtdpdml. You may wish to also specify
the {opt staywide} option so that data remain correctly formatted for use
with the generated do file.
{p_end}

{phang} 
{opt dryrun} This will keep sem from actually being executed. This will catch some
errors immediately and can be useful
if you want to see the sem command that is generated and/or wish to specify
{it:staywide} to reformat the data from long to wide. This will often
be combined with the {it:showcmd}, {it:mplus}, {it:semfile}, or {it:staywide} options.
{p_end}

{phang} 
{opt iterate(#)} Maximum number of iterations allowed. Default
is 250. You can increase this number and/or change the maximization
technique if the model is having trouble converging. 
{p_end}

{phang}
{opt technique(methods)} Maximization techniques used. Default is
{it:technique(nr 25 bhhh 25)}. You can change this if the model is
having trouble converging. See {help maximize} for details as well as
for information on other options that can be used, e.g. {it:difficult}.
{p_end}

{phang}
{opt semopts(options)} Other options allowed by sem will be included in the generated sem command.
See, for example, {help sem_reporting_options}.
{p_end}

{phang}
{opt fiml} Full Information Maximum Likelihood is used for missing data. This is the equivalent of specifying 
method(mlmv) on the sem command. Use of fiml sometimes dramatically slows down execution so be patient
if you use it!
{p_end}

{phang} {opt skipcfatransform} and {opt skipconditional} - Stata 14.2
changed the way starting values are computed in sem. When used
together, {opt skipcfatransform} and {opt skipconditional} cause Stata
to compute starting values the same way as it did before Stata 14.2.
Usually the new procedures work better, especially when fiml is used,
but sometimes the old start values speed up execution and/or are
better for getting models to converge. These options are ignored in
Stata 14.1 or earlier. Experience suggests that {opt skipcfatransform}
is often enough but sometimes both options may help. {p_end}

{phang}
{opt v12} xtdpdml was written and tested using Stata 13. The v12 option will also cause it to run under
Stata 12.1. This has not been extensively tested so use at your own risk.
{p_end}

{marker "Special Topics"}{...}
{title:Special Topics}

{dlgtab:Interactions with Time}

{pstd}Users sometimes want constants and variable effects to differ across
time. xtdpdml can do this but, because data are reshaped wide, the procedure
is different than it is with other programs.

{pstd}By default, xtdpdml lets the constants differ across time periods. In other
programs this would be like including i.time in the model. The {opt constinv} or
{opt nocsd} options can be specified if the user wants the constants to 
be invariant across time. Note that using these options will sometimes cause 
convergence problems.

{pstd}In other situations the user might want interactions with time where the
effect of a variable is free to differ across time periods. In other programs
this might be accomplished by specifying something like i.time#c.ses. With
xtdpdml you use the free options instead, e.g. {opt xfree(ses)} will allow the
effect of ses to differ at each time period.
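
{pstd}As a sketch using the wages data from the Examples section below,
the following frees the effect of lwage only and then tests the equality
constraint with a likelihood-ratio test. It assumes, following the
xfree(ses) usage above, that the variable is named without a time-series
operator; the stored names m_eq and m_free are arbitrary.{p_end}
{phang2}{cmd}* effect of lwage constrained to be equal across waves (default){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(lwage effect constrained){p_end}
{phang2}est store m_eq{p_end}
{phang2}* effect of lwage free to differ at each time period{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) xfree(lwage) ti(lwage effect free){p_end}
{phang2}est store m_free{p_end}
{phang2}lrtest m_eq m_free, stats{p_end}{txt}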

{dlgtab:Convergence Problems}

{pstd}xtdpdml sometimes has trouble converging to a solution. Here are some
things you can try when that happens. 

{pstd}xtdpdml works best when panels are strongly balanced, T is small (e.g.
less than 10), and there is no missing data. If these conditions do not apply
to your data, consider doing the following.

{p 6 6 2} The {it:fiml} option will often help when some data are missing.

{p 6 6 2} Consider restricting your data to a smaller range of time periods
where most or all cases have complete data. See the example using the abdata
given below. Or, you might consider using only every kth year, e.g. 1980, 1985,
1990, ..., 2015. Using fewer variables in the model may also help.

{p 6 6 2} Consider rescaling variables, e.g. measure income in thousands of 
dollars rather than in dollars. This can help with numerical precision problems.
The {opt std} option makes rescaling and standardizing variables easy, 
although it may make coefficients a little harder to interpret. If {opt std}
solves a convergence problem then you may want to rescale the variables
yourself in a more interpretable way.

{p 6 6 2} Stata 14.2 changed the way start values are computed. Our
experience is that models using fiml tend to run far more quickly now.
However, sometimes the new start values actually make the models run
more slowly or cause convergence problems. If you are running Stata
14.2 or later, you can add the options {opt skipcfatransform} and/or
{opt skipconditional} to make Stata use the old starting values
methods.

{p 6 6 2} Mplus sometimes succeeds when Stata has problems and is often much
faster. Try the {opt mplus} option if you have access to the program.

{p 6 6 2} Finally, remember that problems with regressing Y on lagged
Y are not that severe when T and/or N is large. Methods like xtreg may
meet your needs in such situations. But even then, features like fiml
and time-invariant independent variables may make it worth your while
to pare your dataset down so you can do at least some analyses with
xtdpdml.
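
{pstd}Some of these suggestions can be sketched with the wages data from
the Examples section below. Which variables to standardize, and whether
the old start values help, will depend on your own data, so treat the
choices here as illustrative only.{p_end}
{phang2}{cmd}* standardize one variable (do not use time-series notation in std){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) std(lwage){p_end}
{phang2}* fiml with the pre-14.2 start values (option is ignored before Stata 14.2){p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml skipcfatransform{p_end}{txt}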

{pstd}There are several other options you can try if you are having problems
achieving convergence. Much of this advice applies to many programs, 
not just xtdpdml.

{p 6 6 2} The {it:difficult} option will sometimes work miracles. 
There is no guarantee it will work but it is very easy to try.

{p 6 6 2} The {it:technique} option can be specified to use different maximization
techniques. See the help for {help maximize}. 

{p 6 6 2} {opt evars} sometimes helps with convergence when there are no
predetermined variables in the model. It is an alternative and usually
less efficient way of specifying the error terms. But sometimes it helps
and may be necessary for replicating results from earlier versions of
xtdpdml.

{p 6 6 2} The {it:iterate} option can be used to increase or decrease the number
of iterations xtdpdml tries before giving up. The {it:details} option will
show the iteration log. You can increase or decrease the number of iterations
depending on whether it appears the program is converging to a solution.
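
{pstd}A sketch combining these options, again with the wages data from the
Examples section below. The iteration limit and the particular techniques
are arbitrary choices for illustration, not recommendations.{p_end}
{phang2}{cmd}* try the difficult option with more iterations and show the iteration log{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) difficult iterate(500) details{p_end}
{phang2}* alternate between BFGS and Newton-Raphson steps{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) technique(bfgs 10 nr 10){p_end}{txt}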


{marker examples}{...}
{title:Examples}

{pstd}Data setup. Data should be xtset first with both panel id and time variable specified.
Run these commands before trying the other examples.{p_end}
{phang2}{cmd}
use http://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{txt}

{pstd}Lag 1 for the y, strictly exogenous and predetermined variables,
and a time-invariant variable{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) show{p_end}{txt}

{pstd}Same as above, writing out the equivalent sem code.{p_end}
{phang2}{cmd}preserve{p_end}
{phang2}keep wks lwage union ed id t{p_end}
{phang2}reshape wide wks lwage union, i(id) j(t){p_end}
{phang2}sem 	(wks2 <- wks1@b1 lwage1@b2 union1@b3 ed@b4 Alpha@1 E2@1 ) ///{p_end}
{phang2}	(wks3 <- wks2@b1 lwage2@b2 union2@b3 ed@b4 Alpha@1 E3@1) ///{p_end}
{phang2}	(wks4 <- wks3@b1 lwage3@b2 union3@b3 ed@b4 Alpha@1 E4@1) ///{p_end}
{phang2}	(wks5 <- wks4@b1 lwage4@b2 union4@b3 ed@b4 Alpha@1 E5@1) ///{p_end}
{phang2}	(wks6 <- wks5@b1 lwage5@b2 union5@b3 ed@b4 Alpha@1 E6@1) ///{p_end}
{phang2}	(wks7 <- wks6@b1 lwage6@b2 union6@b3 ed@b4 Alpha@1), ///{p_end}
{phang2}	var(e.wks2@0 e.wks3@0 e.wks4@0 e.wks5@0 e.wks6@0) var(Alpha) ///{p_end}
{phang2}	cov(Alpha*(ed)@0) cov(Alpha*(E2 E3 E4 E5 E6)@0) /// {p_end}
{phang2}	cov(_OEx*(E2 E3 E4 E5 E6)@0) cov(E2*(E3 E4 E5 E6)@0) ///{p_end}
{phang2}	cov(E3*(E4 E5 E6)@0) cov(E4*(E5 E6)@0) cov(E5*(E6)@0) ///{p_end}
{phang2}	cov(union3*(E2)) cov(union4*(E2 E3)) cov(union5*(E2 E3 E4)) ///{p_end}
{phang2}	cov(union6*(E2 E3 E4 E5)) ///{p_end}
{phang2}	iterate(250) technique(nr 25 bhhh 25) noxconditional{p_end}
{phang2}restore{p_end}{txt}

{pstd}Lags 0 and 1 of union are included as independent variables.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L(0 1).union) ti(Baseline Model + lag 0 of union){p_end}{txt}

{pstd}No lag on Xs{p_end}
{phang2}{cmd}xtdpdml wks lwage, inv(ed) pre(union) {p_end}{txt}

{pstd}No lagged ys included in the model{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ylag(0){p_end}{txt}

{pstd}xfree and yfree options -- All lagged Y and X effects free to vary across time.
This is how you allow for interactions with time.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline Model) {p_end}
{phang2}est store m1{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) yfree xfree ti(Baseline Model + yfree xfree){p_end}
{phang2}est store m2{p_end}
{phang2}lrtest m1 m2, stats{p_end}{txt}

{pstd}Postestimation commands. Many/most sem postestimation commands work with xtdpdml.
For some commands it may be necessary to specify the staywide option so the data set is
properly formatted. In the following examples we get several goodness of
fit measures. We also replay all the results using 99% confidence levels.{p_end}
{phang2}{cmd}xtdpdml wks L.lwage, inv(ed) pre(L.union) {p_end}
{phang2}estat gof, stats(all){p_end}
{phang2}sem, l(99) nocnsr {p_end}{txt}

{pstd}Missing data. The fiml (Full Information Maximum Likelihood) option can be very effective
for dealing with data that are missing on a random basis. It is generally much 
easier to use fiml than it is to use multiple imputation. {p_end}
{phang2}{cmd}* Results with no missing data -- provides a baseline for{p_end}
{phang2}* assessing how well fiml works.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with no missing data){p_end}
{phang2}* Now we create missing data (every 10th observation) since there is none. But normally you{p_end}
{phang2}* would not do this!{p_end}
{phang2}replace union = . if _n/10 == int(_n/10){p_end}
{phang2}* fiml not used -- 60% of cases lost, estimates are quite a bit off.{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) ti(Baseline with missing data, no fiml){p_end}
{phang2}* fiml used -- works extremely well, at least in this case{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) fiml ti(Baseline with missing data, using fiml)
{p_end}{txt}

{pstd}Bollen and Brand (2010) replication. In their 2010 Social Forces paper,
Bollen and Brand present a series of Panel Models with Random and Fixed Effects.
Many, perhaps all, of their models can be easily replicated with xtdpdml (although
hand tweaking of the code may be required in a few cases). Sometimes xtdpdml yields
a modestly different model chi-square value than what they reported but we believe the xtdpdml value is
the correct one. Here we present the fixed effects model 2 from their Table 3.{p_end}
{phang2}{cmd}* Bollen & Brand Social Forces 2010 Fixed Effects Table 3 Model 2 p. 15 {p_end}
{phang2}use http://www3.nd.edu/~rwilliam/statafiles/bollenbrand, clear {p_end}
{phang2}xtdpdml lnwg hchild marr div, ylag(0) fiml tfix errorinv gof {p_end}
{txt}

{pstd}Comparisons with xtabond -- coefficients are similar, but xtdpdml estimates tend to
be more statistically significant. Include time dummies in xtabond since constants are
free to vary across time (by default) in xtdpdml. Alternatively you
could leave the time dummies out of xtabond and use constinv option with
xtdpdml. tfix option is necessary since year is coded in years rather
than t = 1, 2, ..., T. The evars option is used to help with convergence
and to replicate results from earlier versions of the program. The A/B
data are very unbalanced so we restrict the analysis to a shorter time frame.
{p_end}
{phang2}{cmd}webuse abdata, clear{p_end}
{phang2}keep if year >=1978 & year <= 1982{p_end}
{phang2}xtabond n l(0/1).w l(0/2).(k ys) yr1976-yr1984, lags(2){p_end}
{phang2}xtdpdml n l(0/1).w l(0/2).(k ys) , ylags(1 2) tfix evars ti(A/B data 1978 - 1982 Only){p_end}{txt}

{pstd}Create files for Mplus -- 
This will create Mplus .dat and .inp files but some editing may be necessary.
Files are written to the current directory so make sure it is
writing to the directory you want. The following will create m1.dat and
m1.inp, replacing any existing files by those names. dryrun will keep
Stata from actually estimating the model, which can be a good idea if you only
want the Mplus files. The Mplus output will 
include the Modification Indices and the descriptive sample statistics. Be sure
to use semicolons if you have multiple options for either analysis or output.
{p_end}{cmd}
{phang2}use http://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) dryrun ti(Baseline Model) mplus(m1, r out(mod; sampstat)){p_end}
{phang2}* View or edit the mplus .inp file if you want {p_end}
{phang2}doedit m1.inp {p_end}
{phang2}* Run mplus if you want to. Mplus must be installed! {p_end}
{phang2}* The correct command may depend on your OS and your computer setup. {p_end}
{phang2}!mplus m1.inp {p_end}
{phang2}* View or edit the mplus output file if you want {p_end}
{phang2}doedit m1.out {p_end}
{txt}

{pstd}Generate a sem do file -- You can output the generated sem commands
to a do file. This may be useful if you want to modify the commands in ways
not easily done with xtdpdml. In this example the file
mytry.do is created and (because the r option is specified) 
any existing file by that name is
overwritten. The staywide option keeps the data in the wide format
that is required by sem. {p_end}{cmd}
{phang2}use http://www3.nd.edu/~rwilliam/statafiles/wages, clear{p_end}
{phang2}xtset id t{p_end}
{phang2}xtdpdml wks L.lwage, inv(ed) pre(L.union) staywide semfile(mytry, r){p_end}
{txt}

{marker authors}{...}
{title:Authors}

{p 5 5}
Richard Williams, University of Notre Dame, Department of Sociology{break}
Paul Allison, University of Pennsylvania, Department of Sociology{break}
Enrique Moral Benito, Banco de Espana, Madrid {break}
Support: Richard.A.Williams.5@ND.Edu{break}
Web Page: {browse "http://www3.nd.edu/~rwilliam/dynamic/index.html"}{break}

{marker acknowledgments}{...}
{title:Acknowledgments}

{p 5 5} Ken Bollen and Jennie Brand graciously provided us with the
data from their 2010 Social Forces paper to use in our examples. UCLA
and Michael Mitchell kindly allowed us to take their stata2mplus
program  and adapt it for our purposes. Code from Mead Over's linewrap
program was modified for use with the semfile option. William Lisowski
and Clyde Schechter provided comments that improved program coding.
Paul von Hippel offered helpful comments on the program's
documentation. Kristin MacDonald and other Stata Corp staff were very
helpful in modifying Stata so that sem and xtdpdml would execute much
more quickly.

{marker references}{...}
{title:References}

{p 5 5} Moral-Benito, Enrique, Paul Allison and Richard Williams. 2016
(in progress). "Dynamic Panel Data Modeling using Maximum Likelihood:
An Alternative to Arellano-Bond." This currently is the main working
paper using and explaining the xtdpdml method and program. 
{browse "http://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break}

{p 5 5}Williams, Richard, Paul Allison and Enrique Moral-Benito. 2015.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling". Presented July 30, 2015 at the 2015 Stata
Users Conference in Columbus, Ohio. 
{browse "http://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break}

{p 5 5}Allison, Paul. 2015. "Don't Put Lagged Dependent Variables in Mixed Models."
{break}
{browse "http://statisticalhorizons.com/lagged-dependent-variables"}

{p 5 5}Moral-Benito, Enrique. 2013. "Likelihood-based Estimation of
Dynamic Panels with Predetermined Regressors." Journal of Business and
Economic Statistics 31:4, 451-472.

{p 5 5}Bollen, Kenneth, and Jennie Brand. 2010. "A General Panel Model with Random
and Fixed Effects: A Structural Equations Approach." Social Forces 89:1, 1-34.

{marker "suggested citation"}{...}

{title:Suggested citations if using {cmd:xtdpdml} in published work }

{p 5 5}{cmd:xtdpdml} is not an official Stata command. It is a free
contribution to the research community, like a paper. Please cite it
as such. For now, the suggested citations are

{p 5 5}Williams, Richard, Paul Allison and Enrique Moral-Benito. 2015.
"Linear Dynamic Panel-Data Estimation using Maximum Likelihood and
Structural Equation Modeling". Presented July 30, 2015 at the 2015 Stata
Users Conference in Columbus, Ohio.
{browse "http://www3.nd.edu/~rwilliam/dynamic/xtdpdml_Stata2015.pdf"}{break}

{p 5 5} Moral-Benito, Enrique, Paul Allison and Richard Williams. 2016
(in progress). "Dynamic Panel Data Modeling using Maximum Likelihood:
An Alternative to Arellano-Bond."  
{browse "http://www3.nd.edu/~rwilliam/dynamic/Benito_Allison_Williams.pdf"}{break}
